Psychological Review
● American Psychological Association (APA)
All preprints, ranked by how well they match Psychological Review's content profile, based on 19 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.
Forsgren, M.; Juslin, P.; van den Berg, R.
Extensive research in the behavioural sciences has addressed people's ability to learn stationary probabilities, which stay constant over time, but only recently have there been attempts to model the cognitive processes whereby people learn - and track - non-stationary probabilities. In this context, the old debate on whether learning occurs by gradual formation of associations or by occasional shifts between hypotheses representing beliefs about distal states of the world has resurfaced. Gallistel et al. (2014) pitted the two theories against each other in a non-stationary probability learning task. They concluded that various qualitative patterns in their data were incompatible with trial-by-trial associative learning and could only be explained by a hypothesis-testing model. Here, we contest that claim and demonstrate that it was premature. First, we argue that their experimental paradigm consisted of two distinct tasks: probability tracking (an estimation task) and change detection (a decision-making task). Next, we present a model that uses the (associative) delta learning rule for the probability tracking task and bounded evidence accumulation for the change-detection task. We find that this combination of two highly established theories accounts well for all qualitative phenomena and outperforms the alternative model proposed by Gallistel et al. in a quantitative model comparison. In the spirit of cumulative science, we conclude that current experimental data on human learning of non-stationary probabilities can be explained as a combination of associative learning and bounded evidence accumulation and does not require a new model.
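A minimal sketch of the two-part idea described above, combining a delta-rule probability tracker with a bounded accumulator for change detection. Function names, parameter values, and the reset-on-detection rule are illustrative assumptions, not the authors' fitted model:

```python
import numpy as np

rng = np.random.default_rng(0)

def delta_rule_with_change_detector(outcomes, alpha=0.1, bound=3.0):
    """Track a Bernoulli probability with the delta rule; accumulate
    evidence that the estimate is miscalibrated and report a change
    when the accumulator hits a bound (a 'change detected' decision)."""
    p_hat = 0.5            # running probability estimate
    evidence = 0.0         # bounded accumulator for change detection
    estimates, changes = [], []
    for y in outcomes:
        pe = y - p_hat                 # prediction error
        p_hat += alpha * pe            # associative (delta-rule) update
        evidence += pe                 # signed evidence of deviation
        if abs(evidence) >= bound:     # bounded evidence accumulation
            changes.append(True)
            evidence = 0.0             # restart after a detected change
        else:
            changes.append(False)
        estimates.append(p_hat)
    return np.array(estimates), np.array(changes)

# a step change in the true probability at trial 200
outcomes = np.concatenate([rng.random(200) < 0.2, rng.random(200) < 0.8])
est, chg = delta_rule_with_change_detector(outcomes.astype(float))
```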
Yoo, A. H.; Acerbi, L.; Ma, W. J.
What are the contents of working memory? In both behavioral and neural computational models, a working memory representation is typically described by a single number, namely a point estimate of a stimulus. Here, we asked if people also maintain the uncertainty associated with a memory, and if people use this uncertainty in subsequent decisions. We collected data in a two-condition orientation change detection task; while both conditions measured whether people used memory uncertainty, only one required maintaining it. For each condition, we compared an optimal Bayesian observer model, in which the observer uses an accurate representation of uncertainty in their decision, to one in which the observer does not. We find that this "Use Uncertainty" model fits better for all participants in both conditions. In the first condition, this result suggests that people use uncertainty optimally in a working memory task when that uncertainty information is available at the time of decision, confirming earlier results. Critically, the results of the second condition suggest that this uncertainty information was maintained in working memory. We test model variants and find that our conclusions do not depend on our assumptions about the observer's encoding process, inference process, or decision rule. Our results provide evidence that people have uncertainty that reflects their memory precision on an item-specific level, maintain this information over a working memory delay, and use it implicitly in a way consistent with an optimal observer. These results challenge existing computational models of working memory to update their frameworks to represent uncertainty.
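As a toy illustration of the contrast being tested, here is a hedged sketch of an uncertainty-using versus fixed-criterion change-detection rule. All names and the specific criterion form are assumptions, not the paper's observer models:

```python
import numpy as np

def report_change(m1, m2, sigma1, sigma2, k=1.0, use_uncertainty=True):
    """Toy decision rule for orientation change detection.
    With use_uncertainty=True, the criterion scales with the trial's
    memory noise (an uncertainty-using observer); otherwise a fixed
    criterion ignores item-specific uncertainty."""
    delta = np.abs(m1 - m2)                    # measured orientation change
    if use_uncertainty:
        return delta > k * np.sqrt(sigma1**2 + sigma2**2)
    return delta > k                           # fixed-criterion observer
```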
Sun, J.; Ni, Y.; Li, J.
Evidence for positivity and optimism bias abounds in high-level belief updates. However, no consensus has been reached regarding whether learning asymmetries exist in more elementary forms of updates such as reinforcement learning (RL). In RL, the learning asymmetry concerns the sensitivity difference in incorporating positive and negative prediction errors (PE) into value estimation, namely the asymmetry of learning rates associated with positive and negative PEs. Although RL has been established as a canonical framework for interpreting agent and environment interactions, the direction of the learning rate asymmetry remains controversial. Here, we propose that part of the controversy stems from the fact that people may have different value expectations before entering the learning environment. Such a default value expectation influences how PEs are calculated and consequently biases subjects' choices. We test this hypothesis in two learning experiments with stable or varying reinforcement probabilities, across monetary gain, loss, and gain-loss mixture environments. Our results consistently support the model incorporating asymmetric learning rates and an initial value expectation, highlighting the role of initial expectation in value update and choice preference. Further simulation and model parameter recovery analyses confirm the unique contribution of initial value expectation in assessing learning rate asymmetry. Author Summary: While the RL model has long been applied to modeling learning behavior, where value updating stands at the core of the learning process, it remains controversial whether and how learning is biased when updating from positive and negative PEs. Here, through model comparison, simulation, and recovery analyses, we show that accurate identification of learning asymmetry is contingent on taking into account subjects' default value expectations in both monetary gain and loss environments. Our results stress the importance of initial expectation specification, especially in studies investigating learning asymmetry.
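The core mechanism, asymmetric learning rates plus a free initial value expectation, can be sketched with a standard Rescorla-Wagner-style update. Parameter values here are illustrative, not the fitted ones:

```python
def q_update(q, reward, alpha_pos=0.3, alpha_neg=0.1):
    """Value update with asymmetric learning rates: positive and
    negative prediction errors are weighted differently."""
    pe = reward - q
    alpha = alpha_pos if pe > 0 else alpha_neg
    return q + alpha * pe

q = 0.5   # initial value expectation: a free parameter, not fixed at 0
for r in [1, 0, 1, 1, 0]:
    q = q_update(q, r)
```

Because the initial expectation determines the sign of early prediction errors, a biased starting value can masquerade as a learning rate asymmetry, which is the confound the abstract highlights.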
Herregods, S.; Le Denmat, P.; Desender, K.
Making a decision and reporting confidence in the accuracy of that decision are thought to be driven by the same mechanism: the accumulation of evidence. Previous research has shown that choices and reaction times are well accounted for by a computational model assuming noisy accumulation of evidence until crossing a decision boundary (e.g., the drift diffusion model). Decision confidence can be derived from the amount of evidence following post-decision evidence accumulation. Currently, the stopping rule for post-decision evidence accumulation is underspecified. In the current work, we quantitatively and qualitatively compare the ability of four prominent models of confidence couched within evidence accumulation to account for this stopping rule. In two experiments, participants were instructed to make fast or accurate decisions, and to give fast or carefully considered confidence judgments. We then compared the different models in their ability to capture the speed-accuracy effects on confidence. Both qualitatively and quantitatively, the data were best accounted for by our newly proposed Flexible Collapsing Boundaries model, in which post-decision accumulation terminates once it reaches one of two opposing, slowly collapsing confidence boundaries. Inspection of the parameters of this model revealed that instructing participants to make fast versus accurate decisions influenced the height of the decision boundaries, while instructing participants to make fast versus careful confidence judgments influenced the height of the confidence boundaries. Our data show that the stopping rule for confidence judgments can be well described as an accumulation-to-bound process, and that the height of these confidence boundaries is under strategic control.
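A hedged sketch of the collapsing-boundaries idea for the post-decision stage. The linear collapse and all parameter values are illustrative assumptions, not the fitted model:

```python
import numpy as np

rng = np.random.default_rng(1)

def post_decision_confidence(v=0.5, a0=2.0, collapse=0.05, dt=0.01, s=1.0):
    """After a choice, keep accumulating evidence until it hits one of
    two opposing confidence boundaries that slowly collapse toward zero;
    which boundary is hit, and when, fixes the confidence report."""
    x, t = 0.0, 0.0
    while True:
        upper = a0 - collapse * t          # collapsing high-confidence bound
        lower = -a0 + collapse * t         # collapsing low-confidence bound
        if x >= upper:
            return "high confidence", t
        if x <= lower:
            return "low confidence", t
        x += v * dt + s * np.sqrt(dt) * rng.normal()
        t += dt
```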
Zylberberg, A.
The ability to evaluate one's own knowledge states is often studied using paradigms in which participants make a decision and subsequently report their confidence. This structure has motivated hierarchical models in which confidence arises from a metacognitive process, distinct from the decision process itself, that estimates the probability that the choice is correct (Meyniel et al., 2015; Pouget et al., 2016; Fleming and Daw, 2017). Here, we contrast this framework with an alternative based on an intentional architecture (Shadlen et al., 2008). In this account, choice and confidence are determined simultaneously through a multidimensional drift-diffusion process, where each dimension represents one choice-confidence combination (Ratcliff and Starns, 2009, 2013). Choice, response time, and confidence jointly emerge when one of these accumulators reaches a decision bound. To adjudicate between these accounts, we fit both models to behavioral data from two perceptual tasks: a random-dots motion discrimination task with incentivized confidence reports, and a luminance discrimination task without feedback or incentives. The integrated model provided a superior fit for the incentivized motion task, whereas the hierarchical model more accurately captured behavior in the unincentivized luminance task. These results suggest that confidence does not rely on a single computational mechanism; rather, its implementation may adapt to the specific demands and structure of the task.
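A minimal sketch of the intentional (integrated) architecture: a race among accumulators, one per choice-confidence combination, in the spirit of the Ratcliff and Starns models cited above. The implementation details are assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def race_choice_confidence(drifts, bound=1.0, dt=0.01, s=0.3):
    """Race between accumulators, one per choice-confidence combination
    (e.g., left-low, left-high, right-low, right-high). The first to
    reach the bound fixes choice, confidence, and response time jointly."""
    x = np.zeros(len(drifts))
    t = 0.0
    while x.max() < bound:
        x += np.asarray(drifts) * dt + s * np.sqrt(dt) * rng.normal(size=len(drifts))
        t += dt
    return int(x.argmax()), t   # winning index encodes the choice-confidence pair
```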
Rmus, M.; Zou, A.; Collins, A. G. E.
In reinforcement learning (RL) experiments, participants learn to make rewarding choices in response to different stimuli; RL models use outcomes to estimate stimulus-response values which change incrementally. RL models consider any response type indiscriminately, ranging from more concretely defined motor choices (e.g., pressing a key with the index finger) to more general choices that can be executed in a number of ways (e.g., getting dinner at a restaurant). But does the learning process vary as a function of the choice type? In Experiment 1, we show that it does: participants were slower and less accurate in learning correct choices of a general format compared to learning more concrete, motor actions. Using computational modeling, we show that two mechanisms contribute to this. First, there was evidence of irrelevant credit assignment: the values of motor actions interfered with the values of other choice dimensions, resulting in more incorrect choices when the correct response is not defined by a single motor action; second, information integration for relevant general choices was slower. In Experiment 2, we replicated and further extended the findings from Experiment 1 by showing that slowed learning was attributable to weaker working memory use, rather than slowed RL learning. In both experiments we ruled out the explanation that the difference in performance between the two condition types was driven by difficulty or differing levels of complexity. We conclude that defining a more abstract choice space used by multiple learning systems for credit assignment recruits executive resources, limiting how much such processes then contribute to fast learning.
Takahashi, T.; Oyo, K.; Tamatsukuri, A.; Higuchi, K.
We view observational causal induction as a statistical independence test under a rarity assumption. This paper complements the two-stage theory of causal induction proposed by Hattori and Oaksford (2007) with a computational analysis. We show that their dual-factor heuristic (DFH) model has a rational account as the square root of the index of (non-)independence under an extreme rarity assumption, contrary to the criticism that the DFH model is non-normative (e.g., Lu et al., 2008). We introduce a model that considers the proportion of assumed-to-be rare instances (pARIs), which is the probability of biconditionals (according to several theories of compound conditionals) and can be seen as a simplified version of the DFH model. While being a single conditional probability, pARIs approximates the non-independence measure, the square of DFH. In reproducing the meta-analysis in Hattori and Oaksford (2007), we confirm that pARIs and DFH have the same level of descriptive adequacy, and that the two models have the highest fit among more than 40 models. Then, we critically examine the computer simulations which were central to the rational analysis in Hattori and Oaksford (2007). We point out two problems in their simulations: samples in some of the simulations being restricted to generative ones, and indefinite values of the models because of the small samples. In light of especially the latter problem of definability, pARIs shows higher applicability.
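For concreteness, the two indices can be computed from a 2x2 contingency table using the standard formulas from Hattori and Oaksford (2007); the cell naming below is an illustrative convention:

```python
import math

def dfh(n11, n10, n01, n00):
    """Dual-factor heuristic, sqrt(P(e|c) * P(c|e)), from a 2x2 table:
    n11 = cause & effect, n10 = cause only, n01 = effect only,
    n00 = neither."""
    return n11 / math.sqrt((n11 + n10) * (n11 + n01))

def paris(n11, n10, n01, n00):
    """pARIs = P(c and e) / P(c or e). Undefined when n11+n10+n01 == 0,
    which is the small-sample definability issue the abstract raises."""
    return n11 / (n11 + n10 + n01)
```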
Franklin, N. T.; Frank, M. J.
Humans routinely face novel environments in which they have to generalize in order to act adaptively. However, doing so involves the non-trivial challenge of deciding which aspects of a task domain to generalize. While it is sometimes appropriate to simply re-use a learned behavior, often adaptive generalization entails recombining distinct components of knowledge acquired across multiple contexts. Theoretical work has suggested a computational trade-off in which it can be more or less useful to learn and generalize aspects of task structure jointly or compositionally, depending on previous task statistics, but empirical studies are lacking. Here we develop a series of navigation tasks which manipulate the statistics of goal values ("what to do") and state transitions ("how to do it") across contexts, and assess whether human subjects generalize these task components separately or conjunctively. We find that human generalization is sensitive to the statistics of the previously experienced task domain, favoring compositional or conjunctive generalization when the task statistics are indicative of such structures, and a mixture of the two when they are more ambiguous. These results support the predictions of a normative "meta-generalization learning" agent that not only generalizes previous knowledge but also generalizes the statistical structure most likely to support generalization. Author Note: This work was supported in part by the National Science Foundation Proposal 1460604 "How Prefrontal Cortex Augments Reinforcement Learning" to MJF. We thank Mark Ho for providing code used in the behavioral task. We thank Matt Nassar for helpful discussions. Correspondence should be addressed to Nicholas T. Franklin (nfranklin@fas.harvard.edu) or Michael J. Frank (michael_frank@brown.edu).
Chow, J.; Don, H. J.; Colagiuri, B.; Livesey, E. J.
Associative learning models have traditionally simplified contingency learning by relying on binary classification of cues and outcomes, such as administering a medical treatment (or not) and observing whether the patient recovered (or not). While successful in capturing fundamental learning phenomena across human and animal studies, these models are not capable of representing the variability in human experience that is common in many real-world contexts. Indeed, where variation in outcome magnitude exists (e.g., severity of illness in a medical scenario), this class of models, at best, approximates the outcome mean with no ability to represent the underlying distribution of values. In this paper, we introduce one approach to incorporating a distributed architecture into a prediction error learning model that tracks the contingency between cues and dimensional outcomes. Our Distributed Model allows associative links to form between the cue and outcome nodes that provide distributed representation depending on the magnitude of the outcome, thus enabling learning that extends beyond approximating the mean. Comparing the Distributed Model against a Simple Delta Model across four contingency learning experiments, we found that the Distributed Model provides a significantly better fit to empirical data in virtually all participants. These findings suggest human learners rely on a means of encoding outcomes that preserves the continuous nature of experienced events, advancing our understanding of causal inference in complex environments. Author Summary: When we learn about cause and effect in everyday life--such as whether a medicine helps recovery from illness--we experience outcomes that vary in degree rather than simply happening or not happening. Traditional models of how humans and animals learn have largely focused on these all-or-nothing scenarios, essentially tracking the average value when outcomes are dimensional. We developed a model that extends simple error-correction models to represent how people learn about relationships between cues and outcomes that can take on a range of values. Instead of just tracking the average, our Distributed Model captures the full spectrum of possible outcomes and their frequencies. We tested this model against a conventional single point-estimate approach across four experiments and found that our Distributed Model better matched how people make predictions in nearly every case. Our findings suggest that a relatively simple adjustment to conventional prediction-error learning algorithms that allows representation of outcome magnitudes provides a powerful way to capture the information we preserve when learning about variable outcomes. This has important implications for understanding how people make predictions and decisions in real-world situations where outcomes naturally vary, from medical treatments to environmental changes.
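A hedged sketch of the distributed idea: delta-rule learning over a vector of outcome-magnitude nodes rather than a single value node. The bin count, coding scheme, and parameter values are assumptions, not the authors' exact architecture:

```python
import numpy as np

def distributed_delta_update(w, outcome_bin, alpha=0.2):
    """Delta-rule update over a vector of outcome-magnitude nodes:
    the cue's weights move toward a one-hot code of the observed
    magnitude, so the weight vector converges on the outcome
    distribution rather than just its mean."""
    target = np.zeros_like(w)
    target[outcome_bin] = 1.0
    return w + alpha * (target - w)

w = np.zeros(10)                      # 10 magnitude bins for one cue
for o in [2, 3, 2, 8, 3, 2]:          # observed outcome magnitudes
    w = distributed_delta_update(w, o)
# w now approximates the relative frequency of each magnitude
```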
Salet, J. M.; Kruijne, W.; Rijn, H. v.; Zimmermann, E.; Schlichting, N.
Timing studies on statistical learning predominantly present temporal regularities in a discrete, trial-by-trial manner. This, however, is a simplified representation of nature's complex temporal structure, in which regularities are embedded in a continuous stream of interrupting, irregular events. Recent studies using more complex, dynamic experimental tasks have nevertheless replicated the ubiquitous finding of a performance benefit for regularly versus irregularly timed events. This regularity benefit is commonly interpreted to emerge from a mechanism that detects and subsequently exploits the regularity, something that would be impossible for irregularly timed events. In contrast, here we show that this interpretation overlooks the influence of learning various time-related factors that contribute to the regularity benefit in such complex temporal environments. Instead of only learning the temporal regularity, participants exploited the temporal properties of targets that we initially considered irregular and uninformative. Participants temporally prepared, in parallel, for different actions to perform and locations to attend, irrespective of the targets' regularity. In fact, this temporal preparation explained regularity benefits without assuming statistical learning of the regular target. Using a computational model, fMTP, we illustrate that such adaptation can arise from associative memory processes underlying temporal preparation.
Crossley, M. J.; Pelzer, B. O.; Ashby, F. G.
The notion of a response criterion is ubiquitous in psychology, yet its cognitive and neural underpinnings remain poorly understood. To address this shortcoming, three computational models that capture different hypotheses about criterial learning were developed and tested. The time-dependent drift model assumes the criterion is stored in working memory and that its value drifts over time. The delay-sensitive learning model assumes that the magnitude of criterial learning is temporally discounted by feedback delay. The reinforcement-learning model assumes that criterial learning emerges from stimulus-response association learning without an explicit representation of the criterion, with learning rate also temporally discounted by feedback delay. The performance of these models was investigated under varying feedback delay and intertrial interval (ITI) durations. The time-dependent drift model predicted that long ITIs and feedback delays both impair criterial learning. In contrast, the delay-sensitive and reinforcement-learning models predicted impairments only with feedback delays. Two behavioral experiments, which tested these predictions, showed that human criterial learning is impaired by delayed feedback but not by long ITIs. These results support the delay-sensitive and reinforcement-learning models, and suggest that even in tasks that appear to rely on explicit, rule-based reasoning, criterial learning may have strong associative underpinnings.
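A one-line sketch of the delay-sensitive hypothesis described above, with exponential temporal discounting assumed purely for illustration (the paper does not commit to this functional form here):

```python
import math

def update_criterion(criterion, error, delay, alpha=0.5, lam=0.8):
    """Delay-sensitive criterial learning: the size of the criterion
    update is temporally discounted by the feedback delay (in seconds),
    while the intertrial interval has no effect on learning."""
    return criterion + alpha * math.exp(-lam * delay) * error
```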
Ota, K.; Maloney, L.
Bayesian decision theory (BDT) is frequently used to model normative performance in perceptual, motor, and cognitive decision tasks where the outcome of each trial is a reward or penalty that depends on the subject's actions. The resulting normative models specify how decision makers should encode and use information about uncertainty and value - step by step - in order to maximize their expected reward. When prior, likelihood, and posterior are probabilities, the Bayesian computation requires only simple arithmetic operations: addition, etc. We focus on visual cognitive tasks where Bayesian computations are carried out not on probabilities but on probability density functions, and where these probability density functions are derived from samples. We break the BDT model into a series of computations and test human ability to carry out each of these computations in isolation. We test three necessary properties of normative use of pdf information derived from a sample - accuracy, additivity, and influence. The influence measure allows us to assess how much weight each point in the sample is assigned in making decisions, and lets us compare the normative use (weighting) of samples to actual use, point by point. We find that human decision makers violate accuracy and additivity systematically, but that the cost of these failures would be minor in common decision tasks. However, a comparison of the measured influence of each sample point with its normative influence demonstrates that individuals' use of sample information is markedly different from the predictions of BDT. We demonstrate that the normative BDT model takes into account the geometric symmetries of the pdf while the human decision maker does not. A heuristic model basing decisions on a single extreme sample point provided a better account of participants' data than the normative BDT model. Author Summary: Bayesian decision theory (BDT) is used to model human performance in tasks where the decision maker must compensate for uncertainty in order to gain rewards and avoid losses. BDT prescribes how the decision maker can combine available data, prior knowledge, and value to reach a decision maximizing expected winnings. Do human decision makers actually use BDT in making decisions? Researchers typically compare overall human performance (total winnings) to the predictions of BDT, but we cannot conclude that BDT is an adequate model of human performance based on overall performance alone. We break BDT down into elementary operations and test human ability to execute such operations. In two of the tests, human performance deviated only slightly (but systematically) from the predictions of BDT. In the third test, we use a novel method to measure the influence of each sample point provided to the human decision maker and compare it to the influence predicted by BDT. When we look at what human decision makers do - in detail - we find that they use sensory information very differently from the normative BDT observer. We advance an alternative non-Bayesian model that better predicts human performance.
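As a loose illustration of the contrast drawn here, a toy normative observer that weights every sample point equally versus a single-extreme-point heuristic. Both are sketches under assumed task details, not the paper's models:

```python
import numpy as np

def bdt_setpoint(sample, candidates, gain):
    """Choose the candidate setting with the highest expected gain,
    approximating the pdf by the empirical distribution of the sample
    (i.e., every sample point gets equal weight/influence)."""
    return max(candidates, key=lambda s: np.mean([gain(s, x) for x in sample]))

def extreme_point_heuristic(sample):
    """A heuristic alternative: base the decision on a single extreme
    sample point rather than weighting the whole sample."""
    s = np.asarray(sample, dtype=float)
    return s[np.argmax(np.abs(s - s.mean()))]
```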
Hayes, W. M.; Touchard, M. J.
Reinforcement learning models can be combined with sequential sampling models to fit choice-RT data. The combined models, known as RL-SSMs, explain a wide range of choice-RT patterns in repeated decision tasks. The present study shows how constraining an RL-SSM with eye gaze data can further enhance its predictive ability. Our model assumes that learned option values and relative gaze independently influence the accumulation of evidence prior to choice. We evaluated the model on data from two eye-tracking experiments (total N = 133) and found that it makes better out-of-sample predictions than other models with different ways of integrating values and gaze at the decision stage. Further, we show that it captures a variety of empirical effects, including the finding that choices become more accurate as the higher-value option receives a greater proportion of the total fixation time. The model can be used to understand how learned option values interact with visual attention to influence choice, joining together two major--but mostly separate--research traditions in the cognitive science of decision making.
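A hedged sketch of a decision stage in which learned values and relative gaze contribute additively (i.e., independently) to the drift rate, as the abstract describes. The functional form and parameters are illustrative assumptions, not the fitted model:

```python
import numpy as np

rng = np.random.default_rng(3)

def gaze_rl_ddm_trial(q_left, q_right, gaze_left, beta_v=1.0, beta_g=0.5,
                      bound=1.0, dt=0.01, s=0.3):
    """Evidence accumulation whose drift combines the learned value
    difference and the relative gaze advantage as separate, additive
    terms (gaze_left = proportion of fixation time on the left option)."""
    drift = beta_v * (q_left - q_right) + beta_g * (2 * gaze_left - 1)
    x, t = 0.0, 0.0
    while abs(x) < bound:
        x += drift * dt + s * np.sqrt(dt) * rng.normal()
        t += dt
    return ("left" if x > 0 else "right"), t
```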
Ofir, N.; Landau, A. N.
Multiple systems in the brain track the passage of time and can adapt their activity to temporal requirements (Paton & Buonomano, 2018). While the neural implementation of timing varies widely between neural substrates and behavioral tasks, at the algorithmic level many of these behaviors can be described as bounded accumulation (Balcı & Simen, 2024). So far, from the range of temporal psychophysical tasks, the bounded accumulation model has only been applied to temporal bisection, in which participants are requested to categorize an interval as "long" or "short" (Balcı & Simen, 2014; Ofir & Landau, 2022). In this work, we extend the model to fit performance in the temporal generalization task, in which participants are required to categorize an interval as being the same or different compared to a standard, or reference, duration (Wearden, 1992). Previous models of performance in this task focused on either the group level or the performance of highly trained animals (Birngruber et al., 2014; Church & Gibbon, 1982; Wearden, 1992). Whether the same models can fit performance from a few hundred trials of single participants, necessary for comparing performance across experimental manipulations, has not been tested. A drift-diffusion model with two decision boundaries fits the data of single participants better than the previous models. We ran two experiments, one comparing performance between vision and audition and another examining the effect of learning. We found that decision boundaries can be modified independently: while the upper boundary was higher in vision compared to audition, the lower boundary decreased with learning in the task.
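One possible reading of a two-boundary drift-diffusion account of temporal generalization, sketched under assumed details (constant drift while the interval unfolds; "same" responses when the terminal state lands between the boundaries). This is illustrative, not the authors' fitted model:

```python
import numpy as np

rng = np.random.default_rng(4)

def temporal_generalization(duration, drift=1.0, lower=0.7, upper=1.3,
                            dt=0.001, s=0.1):
    """Accumulate at a constant drift while the interval unfolds, then
    classify it as 'same' as the standard if the final state falls
    between the two decision boundaries, 'different' otherwise."""
    x = 0.0
    for _ in range(int(duration / dt)):
        x += drift * dt + s * np.sqrt(dt) * rng.normal()
    return "same" if lower < x < upper else "different"
```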
Zylberberg, A.; Krajbich, I.; Shadlen, M. N.
Attention plays a key role in decision-making by directing limited cognitive resources to relevant information. It has been proposed that attention also biases the decision process due to a multiplicative interaction between attention and subjective value (e.g., Krajbich et al., 2010). We tested two predictions of models that posit a causal multiplicative effect of attention on decision formation: (i) the last fixation should be more informative about the choice when the overall value of the alternatives is high, and (ii) more attention should be directed to the chosen option when choices conflict with stated preferences than when they do not. Reanalyzing data from a food-choice task (Krajbich et al., 2010), we found no evidence supporting these predictions. A similar discrepancy with the data is observed in recent normative models, which propose that gaze allocation arises from a process of Bayesian inference about the latent values of the alternatives (Callaway et al., 2021; Jang et al., 2021). An alternative model, in which attention reflects the choice after the decision has been completed, explains key observations, including the last-fixation bias, the gaze-cascade effect, and the effect of the overall value of the alternatives on response times. However, this model does not fully account for the association between dwell time and choice. We conclude that gaze behavior prior to the choice report likely reflects both decisional and post-decisional processes.
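The multiplicative attention mechanism under test is the attentional drift-diffusion drift rate of Krajbich et al. (2010), which can be stated compactly; the parameter values below are illustrative:

```python
def addm_drift(v_attended, v_unattended, d=0.002, theta=0.3):
    """Multiplicative attention model (as in Krajbich et al., 2010):
    while fixating one option, the unattended option's subjective value
    is discounted by theta, biasing the momentary drift toward the
    attended option."""
    return d * (v_attended - theta * v_unattended)
```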
Pronoza, J.; Liedtke, N.; Boeltzig, M.; Schubotz, R. I.; Cheng, S.
When an event occurs that is similar to a previous experience, the original episodic memory can be modified with new information (updating) or a new memory can be encoded (differentiation). Prediction errors, the deviation between expected and actual stimuli, are believed to mediate the competition between updating and differentiation, but the underlying mechanisms remain unclear. Here, we present a new analysis of experimental studies (Boeltzig, Liedtke, & Schubotz, 2025; Liedtke et al., 2025) that examine recognition memory and cued recall of similar conversations. The original version was recognized more confidently than the modified version, and the recognition confidence for modified versions showed a U-shaped dependence on the prediction error. Furthermore, the larger the prediction error, the more frequently participants retrieved two versions during cued recall. To account for these results, we propose a computational model based on a modified Hopfield network, which encodes the original and modified versions sequentially and weights the encoding of new patterns by the prediction error. The model shows that (1) similar new memories interfere with previous ones (updating) while dissimilar ones are stored separately (differentiation), (2) interference from similar representations leads to reduced memory accuracy and lower-confidence recognition, and (3) the encoding weight must be modulated by the prediction error to account for the experimental data. Our modeling results show that prediction-error-driven competition between updating and differentiation can emerge from intrinsic network dynamics alone.
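A hedged sketch of prediction-error-weighted Hebbian encoding in a Hopfield-style network, the core mechanism the abstract proposes. The weighting scheme and scaling are assumptions, not the authors' exact model:

```python
import numpy as np

def encode(W, pattern, prediction_error, eta=1.0):
    """Hebbian storage in a Hopfield network where the encoding strength
    of a new pattern is weighted by the prediction error: a small PE
    barely perturbs the existing trace (updating), while a large PE
    lays down a strong, separate trace (differentiation)."""
    p = np.asarray(pattern, dtype=float)      # entries in {-1, +1}
    W_new = W + eta * prediction_error * np.outer(p, p)
    np.fill_diagonal(W_new, 0.0)              # no self-connections
    return W_new
```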
Park, J.; Chung, D.
People often use recognizable features to infer the value of novel consumables. This "generalization" strategy is known to be beneficial in stable environments, such that individuals can use previously learned rules and values to explore new situations efficiently. However, it remains unclear whether and how individuals adjust their generalization strategy in volatile environments where previously learned information becomes obsolete. We hypothesized that individuals adaptively use generalization by continuously updating their beliefs about the credibility of the feature-based reward generalization model at each state. Our data showed that participants used generalization more when the novel environment remained consistent with the previously learned monotonic association between feature and reward, suggesting efficient utilization of prior knowledge. Against other accounts, we found that individuals incorporated an arbitration mechanism between feature-based value generalization and model-based learning based on volatility tracking. Notably, our suggested model captured the differential impacts of generalization depending on context volatility, such that individuals who were biased the most toward generalization showed the lowest learning errors when the values of stimuli were generalized along the recognizable feature, but showed the highest errors in a volatile environment. This work provides novel insights into the adaptive usage of generalization, orchestrating two distinctive learning mechanisms through monitoring their credibility, and highlights the potential adverse effects of overgeneralization in volatile contexts.
Hu, M.; Worthy, D. A.
Does favoring less valuable options that deliver more frequent rewards reflect flawed decision-making or an adaptive strategy in complex environments? Frequency effects, defined as a bias toward more frequently rewarded but less valuable options, have traditionally been viewed as maladaptive decision-making deficits. In the present study, we used a within-subject design in which participants completed a four-option reinforcement learning task twice, once under a baseline condition and once with a reward frequency manipulation, to test whether better baseline learning predicts greater or lesser susceptibility to frequency-based biases. Participants were first trained on two fixed option pairs and then transferred their knowledge to novel pairings in a testing phase. Across conditions, higher training accuracy generally predicted higher test accuracy, with one critical exception: on trials where a more valuable option was pitted against a more frequently rewarded but less valuable alternative, participants with higher training accuracy exhibited a stronger bias toward the more frequent option. Moreover, baseline optimal choice rates in these specific trials were unrelated to--and even slightly negatively correlated with--optimal choice rates under the frequency condition. Computational modeling further showed that participants with better baseline learning performance were better fit by frequency-sensitive models in the frequency condition and that they weighted frequency-based processing more heavily than value-based processing. Overall, these findings suggest that frequency effects, rather than signaling flawed learning, manifest more strongly in individuals with better baseline learning performance. This seemingly irrational bias may, under conditions of uncertainty, reflect a flexible, adaptive strategy that emerges among the best learners when value-based approaches are costly or unreliable. Author Summary: In daily life, people often face choices between familiar, frequently encountered options and unfamiliar alternatives that may be more valuable. For example, we may keep visiting a local restaurant we know well instead of trying a new one with better reviews. This tendency, known as the frequency effect, reflects a bias toward options that yield more frequent rewards, even when those rewards are smaller and suboptimal overall. Traditionally, such behavior has been interpreted as a sign of neuropsychological impairment or flawed learning, whereas our study found the opposite. We asked 495 participants to complete a reinforcement learning task under two conditions: one with balanced reward frequencies and another in which one option was rewarded more frequently despite being less valuable than its alternative. Surprisingly, we found that better learners in the balanced condition were more likely to show frequency effects when reward frequencies were manipulated and uneven. Computational modeling confirmed that these individuals shifted from value-based strategies to frequency-based ones when the environment made value-based decisions more difficult. These findings suggest that frequency effects are not simply errors. Instead, they may represent an adaptive shortcut that emerges more strongly in better decision-makers as a flexible strategy for navigating uncertain environments when value-based calculations are costly or unreliable.
Samaha, J.; Denison, R.
Confidence in a perceptual decision is a subjective estimate of the accuracy of one's choice. As such, confidence is thought to be an important computation for a variety of cognitive and perceptual processes, and it features heavily in theorizing about conscious access to perceptual states. Recent experiments have revealed a "positive evidence bias" (PEB) in the computations underlying confidence reports. A PEB occurs when confidence, unlike objective choice, over-weights the evidence for the chosen option, relative to evidence against the chosen option. Accordingly, in a perceptual task, appropriate stimulus conditions can be arranged that produce selective changes in confidence reports but no changes in accuracy. Although the PEB is generally assumed to reflect the observer's perceptual and/or decision processes, post-decisional accounts have not been ruled out. We therefore asked whether the PEB persisted under novel conditions that eliminated two possible post-decisional accounts: (1) post-decision evidence accumulation that contributes to a confidence report solicited after the perceptual choice, and (2) a memory bias that emerges in the delay between the stimulus offset and the confidence report. We found that even when the stimulus remained on the screen until observers responded, and when observers reported their choice and confidence simultaneously, the PEB still emerged. Signal detection-based modeling also showed that the PEB was not associated with changes in metacognitive efficiency, but rather with changes in confidence criteria. We conclude that once-plausible post-decisional accounts of the PEB do not explain the bias, bolstering the idea that it is perceptual or decisional in nature.
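A minimal sketch of the positive evidence bias as a signal-detection-style computation: the choice uses the balance of evidence, while confidence over-weights the chosen option's evidence. The weights are illustrative assumptions, not the paper's fitted parameters:

```python
def confidence_pe_bias(e_chosen, e_unchosen, w_pos=1.0, w_neg=0.5):
    """Positive evidence bias: the decision variable is the balance of
    evidence, but the confidence variable weights evidence for the
    chosen option more than evidence against it (w_pos > w_neg)."""
    decision_var = e_chosen - e_unchosen                  # drives the choice
    confidence_var = w_pos * e_chosen - w_neg * e_unchosen
    return decision_var, confidence_var
```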
van den Berg, R.; Zou, Q.; Ma, W. J.
Previous work has shown that humans distribute their visual working memory (VWM) resources flexibly across items: the higher the importance of an item, the better it is remembered. A related, but much less studied question is whether people also have control over the total amount of VWM resource allocated to a task. Here, we approach this question by testing whether increasing monetary incentives results in better overall VWM performance. In three experiments, subjects performed a delayed-estimation task on the Amazon Mechanical Turk platform. In the first two experiments, four groups of subjects received a bonus payment based on their performance, with the maximum bonus ranging from $0 to $10 between groups. We found no effect of the amount of bonus on intrinsic motivation or on VWM performance in either experiment. In the third experiment, reward was manipulated on a trial-by-trial basis using a within-subjects design. Again, no evidence was found that VWM performance depended on the magnitude of potential reward. These results suggest that encoding quality in visual working memory is insensitive to monetary reward, which has implications for resource-rational theories of VWM.